Rank | Count | Beginning |
---|---|---|
5699 | 5588 | اس |
23272 | 3579 | انہوں |
61 | 2343 | یہ |
19600 | 2061 | ان |
35360 | 1403 | پاکستان |
15084 | 1273 | ایک |
94656 | 1083 | وہ |
28009 | 837 | اور |
753 | 786 | اب |
16643 | 665 | اگر |
46608 | 622 | جس |
40232 | 562 | پولیس |
45457 | 555 | جب |
82053 | 551 | لیکن |
10946 | 546 | اسلام |
95022 | 503 | واضح |
97029 | 500 | وزیر |
76182 | 498 | کراچی |
91049 | 485 | میں |
97049 | 460 | وزیراعظم |
83125 | 458 | ہم |
30155 | 452 | بھارت |
56275 | 423 | دوسری |
30161 | 409 | بھارتی |
70737 | 409 | عمران |
37978 | 407 | پھر |
5779 | 398 | اسی |
48807 | 394 | جو |
52404 | 390 | حکومت |
23283 | 389 | انھوں |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N = 1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
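The same count can be reproduced outside the database. The following is a minimal Python sketch, assuming the sentences are available as a list of strings; the function name `sentence_beginnings`, its parameters, and the sample sentences are illustrative and not part of the original corpus tooling. It mirrors the query above by keeping everything before the n-th space of each sentence and counting the resulting prefixes, so the same function also covers N = 2, 3, 4.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent n-word sentence beginnings.

    Hypothetical helper: mimics
    substring_index(sentence, ' ', n) ... group by ... order by cnt desc.
    """
    counts = Counter()
    for sentence in sentences:
        # Splitting on a single space and re-joining the first n pieces
        # keeps exactly the text before the n-th space; a sentence with
        # fewer than n spaces is kept whole.
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # most_common sorts by descending count, like "order by cnt desc".
    return counts.most_common(limit)


# Toy input for illustration only, not taken from the corpus.
sample = ["this is one", "this was two", "that is three"]
print(sentence_beginnings(sample, n=1))  # [('this', 2), ('that', 1)]
```

Beginnings with equal counts come out in order of first appearance here, whereas the SQL query leaves the order of ties unspecified.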
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.1 Most Frequent Sentence Endings I
4.3.1.2 Most Frequent Sentence Endings II
4.3.1.3 Most Frequent Sentence Endings III
4.3.1.4 Most Frequent Sentence Endings IV